COMMENTS ON PRESENTATION BY PAUL COX


Walter  T.  Federer 

Mathematics  Research  Center,  United  States  Army- 
University  of  Wisconsin,  Madison,  Wisconsin 

The  paper  presented  by  Mr.  Cox  is  written  in  a somewhat  provocative 
manner.  I appreciate  this  style  of  presentation  as  it  affords  the  Panel 
ample  opportunity  to  illustrate  several  statistical  points. 

The first point I wish to make relates to the definition and use of terms
in current statistical literature. There is a tendency in the literature
toward vague and imprecise usage of such terms as the design of experiments,
analysis of variance, error rate, etc. It is instructive and useful
to define and to use words or phrases in a specified manner. Any departure
from specificity should be described. Personally, I would prefer to use
definitions of the following form:

i)  Experimental  design  (or  experiment  design)  - The  arrangement  of 
the  observations  in  the  experimental  area  or  space  or  the  procedure  for 
obtaining  the  observations  in  an  experiment. 

ii) Treatment design - The arrangement or selection of treatments for
the experiment (e.g., the selection of levels and combinations of factors
in factorial experiments, etc.).

iii) Determination of sample size - The number of observations necessary
to achieve a prescribed objective. (Authors of some papers on ranking
procedures refer to the determination of numbers of observations as the design
of experiment rather than as the determination of sample size.)

iv) Analysis of variance - The partitioning of the sum of squares into
component parts. (One segment of statistical literature utilizes the term
analysis of variance to be synonymous with an F test, while another segment
utilizes this term to refer to the estimation of variance components,
and so it goes.)

v) Analysis of experimental data - This term includes the preceding one,
but not vice versa. It refers to all statistical computations relevant to a
set of experimental data. An analysis of experimental data refers to the
reduction of data to summary form and is useful in, but does not replace,
the interpretation of experimental results. The interpretation of statistical
results must be made in light of the objectives, conditions, and related
circumstances of the experiment.

vi) Significance level - The terms significance level, Type I error, size of
the test, and $\alpha$ have all been used to refer to the same thing, but
unfortunately nothing is said about the base for computing $\alpha$.

vii) Valid estimate of the error variance - Fisher has defined this term,
but unfortunately many statistical writers bypass this important concept
with the phrase "given that $\sigma^2$ is the error variance." In much of
experimentation the definition of error variance cannot be so glibly bypassed,
but requires a thorough knowledge of the experimental conditions.

We could go on with other terms, but now let us return to Mr. Cox's
paper. The title of the paper is "Statistical Design of Experiment for
Continuous Data"; it deals only with the analysis of experimental results
with  no  reference  either  to  the  experimental  or  treatment  design  as 
defined  above.  Mr.  Hartley  has  discussed  some  considerations  to  be 
given  to  the  treatment  design  for  experiments  with  specified  objectives. 

Mr.  Lucas  will,  I hope,  make  some  comments  about  the  actual  experimental 
design  used  in  this  study  and  illustrate  where  confounding  has  taken  place. 

Mr. Cox's paper is concerned with what to do with a set of data and not with
how to obtain the data. He has raised a number of questions, but rather than
address myself to the specific questions, I prefer to proceed in another manner
which, I hope, will furnish answers to, or illustrate the relevance of, the
questions.

As Messrs. Grubbs, Greenberg, Hartley, and Schneiderman have
already stressed, we must first set up a mathematical model for the data
which will be consistent with the experimental and treatment designs and
with the nature and objectives of the experiment. For example, let us
suppose that thrust, y, may be characterized by the following:


$$y = f(e, t, \theta)$$


where the response variable $y$ is a function of error components denoted
by the vector $e$, of time components denoted by the vector $t$, and of a
set of parameters denoted by the vector $\theta$. Our first job then is to define
the nature of the function. If we are totally ignorant of the response curve,
then we could use a form of polynomial regression as follows:


$$E(y) = \sum_i \beta_i t^i$$

where $\beta_i$ is the $i$th regression coefficient and $t$ the time variable. After
we are satisfied that a suitable mathematical formulation of the problem
has  been  made,  the  parameters  of  the  response  curve  are  estimated.  The 
analysis  of  the  estimates  may  be  made  using  the  results  of  R.  A.  Fisher 
(Jour.  Agric.  Sci.  11:107,  1921  and  Phil.  Trans.  Roy.  Soc.  B,  213:89,  1925) 
and  others.  Also,  multivariate  analysis  procedures  may  be  pursued  for 
summarizing  the  results  for  many  estimates  of  a set  of  parameters.  For 
example, if it is desired to discriminate between response curves, then an
a priori or an a posteriori (these terms are not reserved solely for use by
Bayesians) weighting of coefficients in the discriminant function may be
utilized.
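
As an illustration of the estimation step for a single response curve, the following is a minimal sketch that fits a polynomial in time by ordinary least squares. The choice of ordinary least squares, the function names `solve_linear` and `fit_polynomial`, the degree, and the thrust values are all assumptions made for the example; the discussion above does not prescribe a particular estimation method.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(times, values, degree):
    """Least-squares estimates b_0..b_degree for E(y) = sum_i b_i * t**i."""
    k = degree + 1
    # Normal equations (X'X) b = X'y, where X has entries t_j**i.
    xtx = [[sum(t ** (i + j) for t in times) for j in range(k)] for i in range(k)]
    xty = [sum(y * t ** i for t, y in zip(times, values)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical recordings every tenth of a second from a made-up quadratic curve.
times = [0.1 * j for j in range(1, 31)]
thrust = [2.0 + 3.0 * t - 0.5 * t ** 2 for t in times]
coefficients = fit_polynomial(times, thrust, degree=2)
print(coefficients)
```

Each recorded curve is thereby reduced to a short vector of coefficient estimates, which is the quantity carried forward into any later analysis.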

As a part of the characterization of the model and of the problem it
should be determined whether the total response curve, segments of the total
curve, or specified points (e.g., points of inflection) on the curve are of
interest. After this has been specified, the statistician proceeds with
the estimation problems. Haziness on the form or type of response desired
leads to a confusion of issues.

One  specific  question  raised  by  Mr.  Cox  related  to  the  sample  size  N 
for response curves for continuous data. Now if the data are truly
continuous, N = infinity, but we all know that the recording machine records
an impulse over a measurable period of time, say one-tenth of a second.
In any event N is very large. Several of the previous Panel speakers
have discussed the non-independence of two successive impulses or
recordings by a recording machine. However, I wonder about the relevance of
this  since  we  use,  or  should  use,  these  values  only  to  estimate  the  parameters 
in  the  response  curve.  This  procedure  is,  or  should  be,  repeated  for  many 
response  curves  and  the  variation  among  response  curves  treated  alike 
forms  a basis  for  the  variances  and  covariances  among  the  estimates  of 
parameters  where  each  response  curve  represents  but  one  observation. 
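
The idea that each response curve contributes one observation can be sketched as follows: every curve is reduced to its vector of parameter estimates, and the variances and covariances are computed among those vectors. The numbers below are made up for illustration, and the names `mean_vector` and `covariance_matrix` are hypothetical; the divisor n - 1 is the usual sample convention and is an assumption here.

```python
def mean_vector(estimates):
    """Average each parameter over the curves (one row per curve)."""
    n = len(estimates)
    return [sum(column) / n for column in zip(*estimates)]


def covariance_matrix(estimates):
    """Sample variances and covariances (divisor n - 1) among the estimates."""
    n = len(estimates)
    means = mean_vector(estimates)
    deviations = [[value - m for value, m in zip(row, means)] for row in estimates]
    k = len(means)
    return [
        [sum(d[i] * d[j] for d in deviations) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


# Made-up estimates (b0, b1, b2) from four curves treated alike.
curve_estimates = [
    [2.0, 3.0, -0.5],
    [2.2, 2.8, -0.4],
    [1.8, 3.2, -0.6],
    [2.0, 3.0, -0.5],
]
means = mean_vector(curve_estimates)
covariances = covariance_matrix(curve_estimates)
print(means)
print(covariances)
```

Note that the number of recordings within a curve does not enter this calculation; only the number of curves does.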

At this point I do not see the importance of obtaining a variance of a
single response curve. However, if such is desired, then as an approximation
I would suggest segmentation of the total curve into small segments
of time, where small is such that the estimates are relatively unaffected by
smaller segmentation. Coarse groupings could affect the results considerably.
Some account may need to be taken of the relationship among adjoining
segments, as described by Messrs. Greenberg and Hartley.

The response curves presented in the paper bother me somewhat.
Frankly, I believe (i) that the curves in Figure 3 are not very fictitious,
(ii) that the area under each curve is relatively constant from the
conservation of mass theory, (iii) that a heart-to-heart talk with the physicists and
engineers would do much to simplify the nature of the problem, and
(iv) that maybe Mr. Cox should be considering acceleration = z instead of
thrust = y.


Summed up, this means that I would want some education in this area
before  any  analyses  would  be  performed  on  thrust  or  any  other  data.  It 
may  be  possible  to  reparameterize  the  problem  by  using  a function  of  the 
time  variable  instead  of  the  time  variable  itself.  Some  simple  function 
such  as  log  t might  suffice. 
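
A minimal sketch of such a reparameterization, under the assumption that the response happens to be linear in log t: the curve is fitted as a straight line against log t rather than against t. The data are made up, `fit_line` is a hypothetical helper, and whether log t is appropriate for thrust data is exactly the question that would need to be settled with the physicists and engineers.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up response that is curved in t but linear in log t (t must be positive).
times = [0.1 * j for j in range(1, 51)]
response = [1.0 + 2.0 * math.log(t) for t in times]

log_times = [math.log(t) for t in times]
intercept, slope = fit_line(log_times, response)
print(intercept, slope)
```

When such a transformation fits, two parameters may describe a curve that would otherwise require a polynomial of considerably higher degree in t.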


